Keywords: digitalization, foreign trade activity, international trade, Uzbekistan regions
The goal of this study is to examine the impact of digitalization on the regions of the Republic of Uzbekistan, as well as the consequences for their foreign trade activity.
Li, Zhang, and Zhang (2023) investigates the influence of the digital economy on inter-regional trade within China in their article1. The authors employ a panel data analysis of China’s provinces from 2006 to 2017. Their key arguments are:
The study acknowledges that the digital economy’s impact on resource allocation and technological innovation in promoting inter-regional trade requires further exploration. Although this article analyses the impact of digitalization on inter-regional trade, methodology and findings is similar to this case.
At the practical application level, the digital economy is based on digital technology, and the use of information and communication technology, digital platforms, and blockchain technology may drastically lower trade costs2. Some researchers have also investigated the possibility of the digital economy to improve bilateral trade and greatly increase domestic trade flows by developing a comprehensive digital economy index3. Given the great degree of overlap between the Internet and digitalization, researchers have investigated the Internet’s influence on domestic and international trade. Furthermore, several studies have discovered that the digital economy drives greater growth in local commerce than foreign trade, and the potential for improving inter-regional connection remains enormous.
Compared to statistical approaches using cross-sectional data, panel data analysis allows researchers to achieve a substantially better level of statistical validity in data analysis and evaluation of models using more complex research designs. Panel data analysis has been utilized in many social science research papers and journal articles as a result of these benefits, and in recent years, academics have begun to employ it more and more.
At its core, panel data allows us to control for unobserved heterogeneity – persistent characteristics of individuals that aren’t directly captured by the data but can influence the dependent variable. For example, in this case, regions have their own unique characterstics that have impact on foreign trade turnover, such as geographical location, culture. Panel data helps address this by including an individual fixed effect (α_i) in the model. This effect captures all the time-invariant characteristics of individual ‘i’ that influence income.
As noted by Baltagi (2005)4, one of the fixed effects model’s main advantages is its capacity to account for bias resulting from missing variables. It is possible to effectively handle unobserved heterogeneity, which might potentially obscure the connection between variables in a simple regression.
The mathematical representation of the fixed effects model:
\[Y_it = α_i + \beta X_it + ε_it\]
But there are also drawbacks to the model. Wooldridge (2010)5 asserts that because time-invariant independent factors are differenced out along with the individual fixed effect, fixed effects models are unable to evaluate the impact of these variables.
Despite these drawbacks, fixed effects models have numerous usages in many different industries. When examining the effect of work training on earnings, labor economists utilize them to account for individual abilities. In a similar way, political scientists may use them to examine how various policies affect voting patterns while taking into consideration unreported regional variables.
There is more to panel data theory than just fixed effects. The random effects model, which presupposes that each individual impact is random and uncorrelated with the independent variables, was developed by researchers such as Arellano and Bond (1991)6. This makes it possible to estimate the impacts of independent variables that change over time as well as those that don’t.
The dataset is constructed by the author based on the statistical data from Stat.uz (STATISTICS AGENCY UNDER THE PRESIDENT OF THE REPUBLIC OF UZBEKISTAN). Panel dataset consists of cross-sectional data, including economic data of Uzbekistan’s regions from 2015 to 2022.
There is a total of 112 observations on 14 regions and 12 variables.
This panel dataset is balanced and it has these variables:
GCI - Growth rates of the volume of communication and information services by region (percentage) ~ Source
IS - Number of subscribers with Internet access (thousand) ~ Source
APC - Information about the availability of personal computers (excluding servers) in enterprises and organizations (units) ~ Source
NCI - Number of computers connected to the Internet at enterprises and organizations (units) ~ Source
IA - The share of enterprises and organizations with access to the Internet ~ Source
AW - Average monthly nominal wages of employees of legal entities whose main economic activity is “Information and Communication” ~ Source
TT - Total foreign trade turnover (mln US dollars, Export + Import)
TB - Trade balance (mln US dollars, Export - Import)
All variables are quantitative.
library(corrplot)
library(readxl)
mydata <- read_excel("./DATA/data.xlsx",
sheet = "Main")
head(mydata)
## # A tibble: 6 × 18
## Region Year Export Import GCI IS APC NCI IA AW TT TB
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 The Repub… 2015 95.4 191. 116 501. 32266 9372 22.2 1133. 286. -95.3
## 2 The Repub… 2016 428. 200. 106. 526. 36322 10692 23.9 1276. 628. 228.
## 3 The Repub… 2017 462. 179. 116. 607. 41125 12594 25 1553. 641. 284.
## 4 The Repub… 2018 458. 284. 105. 768. 45483 15431 24.9 1949. 742 174.
## 5 The Repub… 2019 431. 269. 112. 922. 50945 17788 24.7 2322. 700 162.
## 6 The Repub… 2020 363. 176. 108. 1072. 53251 21882 19.7 2437 539 186.
## # ℹ 6 more variables: TNRS <dbl>, NRS <dbl>, NRSI <dbl>, SHPC <dbl>,
## # SHLN <dbl>, NCLN <dbl>
str(mydata)
## tibble [112 × 18] (S3: tbl_df/tbl/data.frame)
## $ Region: chr [1:112] "The Republic of Karakalpakstan" "The Republic of Karakalpakstan" "The Republic of Karakalpakstan" "The Republic of Karakalpakstan" ...
## $ Year : num [1:112] 2015 2016 2017 2018 2019 ...
## $ Export: num [1:112] 95.4 428.3 462.4 457.8 431.1 ...
## $ Import: num [1:112] 191 200 179 284 269 ...
## $ GCI : num [1:112] 116 106 116 105 112 ...
## $ IS : num [1:112] 501 526 607 768 922 ...
## $ APC : num [1:112] 32266 36322 41125 45483 50945 ...
## $ NCI : num [1:112] 9372 10692 12594 15431 17788 ...
## $ IA : num [1:112] 22.2 23.9 25 24.9 24.7 ...
## $ AW : num [1:112] 1133 1276 1553 1949 2322 ...
## $ TT : num [1:112] 286 628 641 742 700 ...
## $ TB : num [1:112] -95.3 228.2 283.7 173.6 162.2 ...
## $ TNRS : num [1:112] 1203 1202 1296 1263 1332 ...
## $ NRS : num [1:112] 67.7 66.6 70.8 68 70.7 72.1 73.7 76.4 60.6 61.5 ...
## $ NRSI : num [1:112] 66.2 65.4 69.6 66.6 68.6 69.9 70.2 71.9 59.7 60.7 ...
## $ SHPC : num [1:112] 40.6 50.9 53.9 59.9 64.7 72.2 54.2 36.2 34.6 37.7 ...
## $ SHLN : num [1:112] 4 4.5 4.9 4.6 4.5 2.9 2.5 5.7 1.8 1.9 ...
## $ NCLN : num [1:112] 10864 12260 14379 16216 18852 ...
Export,
TBGCI,
IS, APC,NCI, TT,
AW, IACategorical (qualitative, nominal): -
Categorical (qualitative, ordinal): Region, Year
summary(mydata)
## Region Year Export Import
## Length:112 Min. :2015 Min. : 76.2 Min. : 50.9
## Class :character 1st Qu.:2017 1st Qu.: 198.9 1st Qu.: 285.6
## Mode :character Median :2018 Median : 313.9 Median : 532.7
## Mean :2018 Mean : 617.8 Mean : 1350.7
## 3rd Qu.:2020 3rd Qu.: 509.9 3rd Qu.: 1247.5
## Max. :2022 Max. :4669.3 Max. :14939.7
## GCI IS APC NCI
## Min. :104.3 Min. : 212.4 Min. : 17331 Min. : 3961
## 1st Qu.:109.9 1st Qu.: 542.1 1st Qu.: 36562 1st Qu.: 10798
## Median :114.2 Median : 874.2 Median : 50846 Median : 16408
## Mean :116.1 Mean :1147.6 Mean : 68217 Mean : 28802
## 3rd Qu.:121.1 3rd Qu.:1380.4 3rd Qu.: 62679 3rd Qu.: 26039
## Max. :143.1 Max. :5385.0 Max. :381051 Max. :236242
## IA AW TT TB
## Min. : 7.60 Min. : 943 Min. : 158.3 Min. :-10270.40
## 1st Qu.:17.20 1st Qu.:1341 1st Qu.: 554.6 1st Qu.: -743.48
## Median :20.05 Median :2107 Median : 852.2 Median : -243.15
## Mean :21.41 Mean :2370 Mean : 1968.5 Mean : -732.95
## 3rd Qu.:24.93 3rd Qu.:2942 3rd Qu.: 1624.4 3rd Qu.: -52.15
## Max. :44.50 Max. :9149 Max. :19609.0 Max. : 283.70
## TNRS NRS NRSI SHPC
## Min. : 523.6 Min. : 20.20 Min. : 19.10 Min. :15.00
## 1st Qu.:1055.3 1st Qu.: 59.15 1st Qu.: 57.60 1st Qu.:38.02
## Median :1470.8 Median : 66.05 Median : 64.45 Median :47.15
## Mean :1760.8 Mean : 74.68 Mean : 70.12 Mean :47.73
## 3rd Qu.:2075.1 3rd Qu.: 74.15 3rd Qu.: 70.12 3rd Qu.:57.40
## Max. :7691.9 Max. :264.40 Max. :197.20 Max. :79.70
## SHLN NCLN
## Min. : 1.300 Min. : 5777
## 1st Qu.: 3.175 1st Qu.: 13314
## Median : 4.500 Median : 17414
## Mean : 4.507 Mean : 28234
## 3rd Qu.: 5.525 3rd Qu.: 22024
## Max. :11.300 Max. :233004
Summary:
Year - Dataset provides information from 2015 to 2022
Export - The minimum export value of 14 regions is 75.2 mln US dollars. And on average, regions exported products worth of 617.8 million. But more than more than 50% of regions indicated 313.9 million US dollar export, which means there are potential outliers in the dataset
Import - Mean value of import shows that regions imported more products than they export on average (1350.7 million US dollars).
Growth rates in communication and information services (GCI) vary between 104.3% and 143.1%, with a mean growth rate of 116.1%.
IS - The number of internet subscribers shows substantial variation, with a minimum of 212.4 thousand and a maximum of 5385.0 thousand. The mean number of internet subscribers is 1147.6 thousand.
APC - The minimum APC is 17331 units, while the maximum is 381051 units.
NCI - The range for NCI is from a minimum of 3961 units to a maximum of 236242 units.
TT - ranges from 158.3 to 19609.0 million US dollars, with a mean of 1968.5 million US dollars.
TB - ranges from -10270.40 to 283.70 million US dollars, with a mean trade deficit of -732.95 million US dollars. A negative trade balance suggests a higher value of imports than exports on average in these regions
# Boxplot for Export and Import
boxplot(mydata$Export, mydata$Import, names = c("Export", "Import"), col = c("skyblue", "lightcoral"), main = "Boxplot of Export and Import")
Boxplot shows that there are many outlier regions in both export and
import idicators and median tends to lower side of the candle.
boxplot(mydata$AW, mydata$IS, names = c("Average monthly wages", "Number of subscribers with Internet access"), col = c("skyblue", "lightcoral"), main = "Boxplot of AW and IS")
Correlation between two variables reveals if they are significantly correlated and provides information about the strength and direction of the linear relationship between them.
The correlation coefficient (r) can range from -1 to 1.
If r = 1, it means there is a complete positive correlation between
variables.
If r equals 0, there is no linear connection between the two variables.
If r = -1, then there is a complete negative correlation between the
variables.
corr <- cor(mydata[c(-1, -2)])
corr
## Export Import GCI IS APC NCI
## Export 1.0000000 0.9464173 0.43072055 0.7820418 0.9137079 0.9126492
## Import 0.9464173 1.0000000 0.35501911 0.8204491 0.9395345 0.9528416
## GCI 0.4307205 0.3550191 1.00000000 0.3961613 0.3286583 0.3782498
## IS 0.7820418 0.8204491 0.39616131 1.0000000 0.8789623 0.8878817
## APC 0.9137079 0.9395345 0.32865826 0.8789623 1.0000000 0.9862589
## NCI 0.9126492 0.9528416 0.37824976 0.8878817 0.9862589 1.0000000
## IA 0.4429925 0.4170457 -0.10351610 0.2056495 0.4978939 0.4693396
## AW 0.6200894 0.6694345 0.53166658 0.7973131 0.6369413 0.7226999
## TT 0.9703829 0.9964019 0.37860103 0.8189549 0.9425880 0.9522642
## TB -0.8790801 -0.9859167 -0.30096647 -0.8060181 -0.9136071 -0.9337969
## TNRS 0.7873466 0.8432048 0.29553383 0.9377916 0.9126194 0.8931389
## NRS 0.7746907 0.8427551 0.25744768 0.7914671 0.8946952 0.9159824
## NRSI 0.6884985 0.7294981 0.18039578 0.7160722 0.8410490 0.8427889
## SHPC 0.1064967 0.1351762 -0.36378357 0.1473343 0.2642021 0.1826870
## SHLN 0.3861876 0.4206250 0.02493716 0.2659371 0.4742133 0.4843789
## NCLN 0.9074301 0.9423575 0.32067204 0.8486574 0.9870688 0.9869327
## IA AW TT TB TNRS NRS
## Export 0.44299253 0.62008938 0.9703829 -0.8790801 0.7873466 0.7746907
## Import 0.41704572 0.66943448 0.9964019 -0.9859167 0.8432048 0.8427551
## GCI -0.10351610 0.53166658 0.3786010 -0.3009665 0.2955338 0.2574477
## IS 0.20564953 0.79731310 0.8189549 -0.8060181 0.9377916 0.7914671
## APC 0.49789394 0.63694131 0.9425880 -0.9136071 0.9126194 0.8946952
## NCI 0.46933963 0.72269994 0.9522642 -0.9337969 0.8931389 0.9159824
## IA 1.00000000 0.02105853 0.4282192 -0.3861638 0.2757321 0.5338615
## AW 0.02105853 1.00000000 0.6634895 -0.6669847 0.6327353 0.6378146
## TT 0.42821917 0.66348946 1.0000000 -0.9681954 0.8373688 0.8337110
## TB -0.38616379 -0.66698466 -0.9681954 1.0000000 -0.8368587 -0.8427488
## TNRS 0.27573209 0.63273533 0.8373688 -0.8368587 1.0000000 0.8303386
## NRS 0.53386153 0.63781462 0.8337110 -0.8427488 0.8303386 1.0000000
## NRSI 0.59758190 0.51350612 0.7263718 -0.7202141 0.7602879 0.9624468
## SHPC 0.54339374 -0.12751777 0.1290641 -0.1443733 0.2115460 0.1704800
## SHLN 0.57457595 0.16881941 0.4159888 -0.4208632 0.3501945 0.4648262
## NCLN 0.51627327 0.64165712 0.9430521 -0.9210248 0.8897700 0.9266619
## NRSI SHPC SHLN NCLN
## Export 0.6884985 0.1064967 0.38618759 0.9074301
## Import 0.7294981 0.1351762 0.42062501 0.9423575
## GCI 0.1803958 -0.3637836 0.02493716 0.3206720
## IS 0.7160722 0.1473343 0.26593705 0.8486574
## APC 0.8410490 0.2642021 0.47421334 0.9870688
## NCI 0.8427889 0.1826870 0.48437893 0.9869327
## IA 0.5975819 0.5433937 0.57457595 0.5162733
## AW 0.5135061 -0.1275178 0.16881941 0.6416571
## TT 0.7263718 0.1290641 0.41598877 0.9430521
## TB -0.7202141 -0.1443733 -0.42086321 -0.9210248
## TNRS 0.7602879 0.2115460 0.35019453 0.8897700
## NRS 0.9624468 0.1704800 0.46482618 0.9266619
## NRSI 1.0000000 0.2197228 0.43891184 0.8672869
## SHPC 0.2197228 1.0000000 0.15434208 0.1926237
## SHLN 0.4389118 0.1543421 1.00000000 0.5454536
## NCLN 0.8672869 0.1926237 0.54545358 1.0000000
corrplot(corr, order = "FPC")
# Ensure the dataframe columns are numeric for the correlation calculation
numeric_data <- mydata[sapply(mydata, is.numeric)]
# Reorder the columns to move TT, TB, Export, and Import to the start
required_cols <- c("TT", "TB", "Export", "Import")
remaining_cols <- setdiff(names(numeric_data), required_cols)
mydata_reordered <- numeric_data[, c(required_cols, remaining_cols)]
# Calculate the correlation matrix
corr <- cor(mydata_reordered, use = "complete.obs")
# Create the correlation plot with hierarchical clustering order
corrplot(corr)
Apart from GCI, all variables significantly correlated with TT, but there is multicollinearity problem with these variables. To fix it let’s create single Digitalization index which can explain the dataset.
library(factoextra)
library(tidyverse)
pca_data <- mydata %>% select(GCI, IS, APC, NCI, AW, IA)
# PCA
pca_result <- prcomp(pca_data, scale = TRUE)
pca_result2 <- princomp(pca_data, cor = TRUE)
summary(pca_result2)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.954070 1.1326041 0.75789785 0.4891632 0.28306936
## Proportion of Variance 0.636398 0.2137987 0.09573486 0.0398801 0.01335471
## Cumulative Proportion 0.636398 0.8501967 0.94593151 0.9858116 0.99916632
## Comp.6
## Standard deviation 0.0707252098
## Proportion of Variance 0.0008336759
## Cumulative Proportion 1.0000000000
summary(pca_result)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 1.9541 1.1326 0.75790 0.48916 0.28307 0.07073
## Proportion of Variance 0.6364 0.2138 0.09573 0.03988 0.01335 0.00083
## Cumulative Proportion 0.6364 0.8502 0.94593 0.98581 0.99917 1.00000
fviz_eig(pca_result, addlabels = TRUE)
#fviz_cos2(pca_result, choice = "var", axes = 1:3)
The table of Importance of components and Scree plot indicate that PC1 can explain 63% of the dataset so this component is can be used as Digitalization index.
principal_components <- pca_result$x
selected_components <- principal_components[, 1]
pca_scores <- selected_components %*% t(principal_components[, 1])
# Combine components to create the PCA index
pca_index <- rowSums(pca_scores)
min_range <- 0
max_range <- 1
scaled_pca_index <- scales::rescale(pca_index, to = c(min_range, max_range))
nd <- mydata
nd$Digix <- scaled_pca_index
#tail(nd[c(-4, -3)], 20)
corm <- cor(nd[c(-1, -2)]); corm
## Export Import GCI IS APC NCI
## Export 1.0000000 0.9464173 0.43072055 0.7820418 0.9137079 0.9126492
## Import 0.9464173 1.0000000 0.35501911 0.8204491 0.9395345 0.9528416
## GCI 0.4307205 0.3550191 1.00000000 0.3961613 0.3286583 0.3782498
## IS 0.7820418 0.8204491 0.39616131 1.0000000 0.8789623 0.8878817
## APC 0.9137079 0.9395345 0.32865826 0.8789623 1.0000000 0.9862589
## NCI 0.9126492 0.9528416 0.37824976 0.8878817 0.9862589 1.0000000
## IA 0.4429925 0.4170457 -0.10351610 0.2056495 0.4978939 0.4693396
## AW 0.6200894 0.6694345 0.53166658 0.7973131 0.6369413 0.7226999
## TT 0.9703829 0.9964019 0.37860103 0.8189549 0.9425880 0.9522642
## TB -0.8790801 -0.9859167 -0.30096647 -0.8060181 -0.9136071 -0.9337969
## TNRS 0.7873466 0.8432048 0.29553383 0.9377916 0.9126194 0.8931389
## NRS 0.7746907 0.8427551 0.25744768 0.7914671 0.8946952 0.9159824
## NRSI 0.6884985 0.7294981 0.18039578 0.7160722 0.8410490 0.8427889
## SHPC 0.1064967 0.1351762 -0.36378357 0.1473343 0.2642021 0.1826870
## SHLN 0.3861876 0.4206250 0.02493716 0.2659371 0.4742133 0.4843789
## NCLN 0.9074301 0.9423575 0.32067204 0.8486574 0.9870688 0.9869327
## Digix 0.8839045 0.9066404 0.51109787 0.9347378 0.9365947 0.9640342
## IA AW TT TB TNRS NRS
## Export 0.44299253 0.62008938 0.9703829 -0.8790801 0.7873466 0.7746907
## Import 0.41704572 0.66943448 0.9964019 -0.9859167 0.8432048 0.8427551
## GCI -0.10351610 0.53166658 0.3786010 -0.3009665 0.2955338 0.2574477
## IS 0.20564953 0.79731310 0.8189549 -0.8060181 0.9377916 0.7914671
## APC 0.49789394 0.63694131 0.9425880 -0.9136071 0.9126194 0.8946952
## NCI 0.46933963 0.72269994 0.9522642 -0.9337969 0.8931389 0.9159824
## IA 1.00000000 0.02105853 0.4282192 -0.3861638 0.2757321 0.5338615
## AW 0.02105853 1.00000000 0.6634895 -0.6669847 0.6327353 0.6378146
## TT 0.42821917 0.66348946 1.0000000 -0.9681954 0.8373688 0.8337110
## TB -0.38616379 -0.66698466 -0.9681954 1.0000000 -0.8368587 -0.8427488
## TNRS 0.27573209 0.63273533 0.8373688 -0.8368587 1.0000000 0.8303386
## NRS 0.53386153 0.63781462 0.8337110 -0.8427488 0.8303386 1.0000000
## NRSI 0.59758190 0.51350612 0.7263718 -0.7202141 0.7602879 0.9624468
## SHPC 0.54339374 -0.12751777 0.1290641 -0.1443733 0.2115460 0.1704800
## SHLN 0.57457595 0.16881941 0.4159888 -0.4208632 0.3501945 0.4648262
## NCLN 0.51627327 0.64165712 0.9430521 -0.9210248 0.8897700 0.9266619
## Digix 0.37652597 0.83461014 0.9101608 -0.8804884 0.8888484 0.8708837
## NRSI SHPC SHLN NCLN Digix
## Export 0.6884985 0.1064967 0.38618759 0.9074301 0.8839045
## Import 0.7294981 0.1351762 0.42062501 0.9423575 0.9066404
## GCI 0.1803958 -0.3637836 0.02493716 0.3206720 0.5110979
## IS 0.7160722 0.1473343 0.26593705 0.8486574 0.9347378
## APC 0.8410490 0.2642021 0.47421334 0.9870688 0.9365947
## NCI 0.8427889 0.1826870 0.48437893 0.9869327 0.9640342
## IA 0.5975819 0.5433937 0.57457595 0.5162733 0.3765260
## AW 0.5135061 -0.1275178 0.16881941 0.6416571 0.8346101
## TT 0.7263718 0.1290641 0.41598877 0.9430521 0.9101608
## TB -0.7202141 -0.1443733 -0.42086321 -0.9210248 -0.8804884
## TNRS 0.7602879 0.2115460 0.35019453 0.8897700 0.8888484
## NRS 0.9624468 0.1704800 0.46482618 0.9266619 0.8708837
## NRSI 1.0000000 0.2197228 0.43891184 0.8672869 0.7840094
## SHPC 0.2197228 1.0000000 0.15434208 0.1926237 0.1155613
## SHLN 0.4389118 0.1543421 1.00000000 0.5454536 0.4035076
## NCLN 0.8672869 0.1926237 0.54545358 1.0000000 0.9308264
## Digix 0.7840094 0.1155613 0.40350758 0.9308264 1.0000000
corrplot(corm)
Based on the correlation matrix above, it is clear that the index we created has significant relation (91%) with Trade turnover of the region
library(ggplot2)
# Calculate mean Digix value for each region
region_digix <- aggregate(Digix ~ Region, data = nd, FUN = mean)
# Plotting the digitalization index over years for different regions
ggplot(data=nd, aes(x = Year, y = Digix, color = Region)) +
geom_line() +
labs(x = "Year", y = "Digitalization Index", title = "Digitalization Index over Time by Region") +
theme_minimal()
ggplot(nd, aes(x = Digix, y = Export)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Scatter Plot of Digix and Export",
x = "Digix",
y = "Export") +
theme_minimal()
boxplot(TT~Year, data = nd)
library(plm)
pdata <- pdata.frame(nd, index = c("Region", "Year"))
# Scatter plot for each region (TT)
ggplot(pdata, aes(x = Digix, y = TT, color = as.factor(Region))) +
geom_point() +
labs(title = "Scatter Plot of TT vs Digix by Region",
x = "Digix",
y = "TT")
# Scatter plot for each region (Export)
ggplot(pdata, aes(x = Digix, y = Export, color = as.factor(Region))) +
geom_point() +
labs(title = "Scatter Plot of Export vs Digix by Region",
x = "Digix",
y = "Export")
# Scatter plot for each region (Export)
ggplot(pdata, aes(x = Digix, y = Import, color = as.factor(Region))) +
geom_point() +
labs(title = "Scatter Plot of Import vs Digix by Region",
x = "Digix",
y = "Import")
library(gplots)
plotmeans(TT~Region , main = 'Heterogeneity Across Regions', data = pdata)
plotmeans(TT~Year , main = 'Heterogeneity Across Years', data = pdata)
These graphs suggest that heterogeneity differs between distinct
units.
\(H_0\): Pooled OLS is not the appropriate estimator
\(H_1\): Pooled OLS is the appropriate estimator
pooledOls <- plm(TT ~ Digix, data = pdata , model = "pooling")
summary(pooledOls)
## Pooling Model
##
## Call:
## plm(formula = TT ~ Digix, data = pdata, model = "pooling")
##
## Balanced Panel: n = 14, T = 8, N = 112
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -3712.148 -559.065 68.143 721.669 3995.789
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -914.80 177.35 -5.1581 1.113e-06 ***
## Digix 18653.68 809.50 23.0434 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1134100000
## Residual Sum of Squares: 194620000
## R-Squared: 0.82839
## Adj. R-Squared: 0.82683
## F-statistic: 530.998 on 1 and 110 DF, p-value: < 2.22e-16
\[TT_i = \beta_0 + \beta_1 \cdot Digix_{it} + u_{it}\] \[ TT = −914.80 + 18653.68 \cdot Digix_{it} + u_{it}\]
Summary:
Pooled OLS ignores heterogeneity totally, therefore even if it has a P-Value of less than 5% and R2, showing high overall accuracy and suitability to use, we cannot rely on this model.
# Scatter plot for each region
ggplot(pdata, aes(x = Digix, y = TT, color = as.factor(Region))) +
geom_point() +
labs(title = "Scatter Plot of TT vs Digix by Region",
x = "Digix",
y = "TT")
fixed.dum <-lm(TT ~ Digix + factor(Region)-1, data = pdata)
summary(fixed.dum)
##
## Call:
## lm(formula = TT ~ Digix + factor(Region) - 1, data = pdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3386.9 -252.8 64.6 373.3 3681.4
##
## Coefficients:
## Estimate Std. Error t value
## Digix 10460.03 1100.96 9.501
## factor(Region)Andijan region 1215.25 338.30 3.592
## factor(Region)Bukhara region -265.52 341.05 -0.779
## factor(Region)Fergana region -231.08 357.73 -0.646
## factor(Region)Jizzakh region -429.10 327.24 -1.311
## factor(Region)Kashkadarya region -630.22 342.79 -1.838
## factor(Region)Khorezm region -755.59 337.52 -2.239
## factor(Region)Namangan region -328.42 337.79 -0.972
## factor(Region)Navoi region -26.23 339.15 -0.077
## factor(Region)Samarkand region -259.60 365.86 -0.710
## factor(Region)Surkhandarya region -529.35 329.62 -1.606
## factor(Region)Syrdarya region -157.13 321.28 -0.489
## factor(Region)Tashkent city 5467.59 761.54 7.180
## factor(Region)Tashkent region 2390.05 366.45 6.522
## factor(Region)The Republic of Karakalpakstan -536.92 336.32 -1.596
## Pr(>|t|)
## Digix 1.61e-15 ***
## factor(Region)Andijan region 0.000517 ***
## factor(Region)Bukhara region 0.438148
## factor(Region)Fergana region 0.519839
## factor(Region)Jizzakh region 0.192865
## factor(Region)Kashkadarya region 0.069053 .
## factor(Region)Khorezm region 0.027462 *
## factor(Region)Namangan region 0.333341
## factor(Region)Navoi region 0.938506
## factor(Region)Samarkand region 0.479685
## factor(Region)Surkhandarya region 0.111540
## factor(Region)Syrdarya region 0.625882
## factor(Region)Tashkent city 1.42e-10 ***
## factor(Region)Tashkent region 3.14e-09 ***
## factor(Region)The Republic of Karakalpakstan 0.113645
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 889.2 on 97 degrees of freedom
## Multiple R-squared: 0.9511, Adjusted R-squared: 0.9435
## F-statistic: 125.7 on 15 and 97 DF, p-value: < 2.2e-16
Here each region is regarded as a dummy variable for estimation.
Overall model is good as the p-value is less than 5%.
Different regions contribute differently to Total Foreign Trade Turnover, as reflected by the fixed effects. Andijan, Khorezm, Tashkent City, and Tashkent Region, compared to others, have significant impact on Foreign Trade Turnover.
Multiple R-squared is 95% which is also more accurate than pooled OLS.
library(car)
yhat <- fixed.dum$fitted
Region <- pdata$Region
scatterplot(yhat ~ pdata$Digix| Region, xlab= "Regions", ylab ="yhat", smoother = quantregLine, legend = list( cex = 0.56, lty = 2))
abline(lm(TT~Digix, data = pdata),lwd=3, col="red")
This graph includes regression lines for 14 regression models of each region. The redline is the total regression line of LSVD model.
The complexity of LSDV increases with the number of parameters, including dummy variables, leading to less efficient estimators. That’s why we need to continue with next model.
fixed <- plm(TT ~ Digix, data =pdata, index=c("Region", "Year"), model ="within")
summary(fixed)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = TT ~ Digix, data = pdata, model = "within", index = c("Region",
## "Year"))
##
## Balanced Panel: n = 14, T = 8, N = 112
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -3386.874 -252.766 64.561 373.296 3681.379
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Digix 10460 1101 9.5008 1.605e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 148080000
## Residual Sum of Squares: 76701000
## R-Squared: 0.48202
## Adj. R-Squared: 0.40726
## F-statistic: 90.2651 on 1 and 97 DF, p-value: 1.6054e-15
P-Value: Overall model is good as the p-value less than 5%
The intercept values are significant as the p-value is less than 5%.
R2: Value of R2 is 0.48202, which is less than pooled OLS and LSVD, but more accurate than them.
fixef(fixed)
## Andijan region Bukhara region
## 1215.251 -265.523
## Fergana region Jizzakh region
## -231.078 -429.099
## Kashkadarya region Khorezm region
## -630.215 -755.593
## Namangan region Navoi region
## -328.420 -26.232
## Samarkand region Surkhandarya region
## -259.598 -529.353
## Syrdarya region Tashkent city
## -157.134 5467.588
## Tashkent region The Republic of Karakalpakstan
## 2390.049 -536.920
Here intercept values of each region is provided. They are similar to the values we got from LSVD model.
random <- plm(TT ~ Digix, data=pdata, index=c("Region", "Year"), model="random")
summary(random)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = TT ~ Digix, data = pdata, model = "random", index = c("Region",
## "Year"))
##
## Balanced Panel: n = 14, T = 8, N = 112
##
## Effects:
## var std.dev share
## idiosyncratic 790727.6 889.2 0.6
## individual 527899.0 726.6 0.4
## theta: 0.6029
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2619.417 -409.508 64.639 478.480 4564.810
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) -201.25 286.61 -0.7022 0.4826
## Digix 14037.34 1019.04 13.7751 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 303580000
## Residual Sum of Squares: 111400000
## R-Squared: 0.63303
## Adj. R-Squared: 0.6297
## Chisq: 189.754 on 1 DF, p-value: < 2.22e-16
P-Value: Overall model is good as the p-value less than 5%
The Digix value is significant as the p-value is less than 5%, but intercept value has low significance
The value of R2 is 0.91129 and 0.63303, which is better than Fixed effects using within model
Breusch-Pagan (BP) tes (Graphical)
library(lmtest)
bptest(TT ~ Digix+factor(Region)+factor(Year), data=pdata, studentize=F)
##
## Breusch-Pagan test
##
## data: TT ~ Digix + factor(Region) + factor(Year)
## BP = 268.71, df = 21, p-value < 2.2e-16
We may reject the null hypothesis since the BP test’s P-value of less than 5% shows that the residual’s variance is changing as the predictor value rises. As a result, heteroscedasticity exists in the data.
Pooled OLS vs Fixed Effect Model
H0: Pooled OLS model is consistent
H1: Fixed effect model is consistent
pFtest(fixed,pooledOls)
##
## F test for individual effects
##
## data: TT ~ Digix
## F = 11.471, df1 = 13, df2 = 97, p-value = 1.545e-14
## alternative hypothesis: significant effects
The result of pFtest () reveals that the p-value is less than 5%, allowing us to reject the null hypothesis. Therefore, the fixed effect model is consistent.
Hausman Test
H0: Random Effect is consistent.
H1: Fixed Effect is consistent.
phtest(fixed, random)
##
## Hausman Test
##
## data: TT ~ Digix
## chisq = 73.678, df = 1, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
Here, the P-value is less than 5%. It suggests that there is significant evidence against the null hypothesis, with less than a 5% chance that it is right. As a result, the fixed effect model is an appropriate estimator.
Two tests above proved that fixed effects model is best estimator, so let’s check for time-fixed effects
fixed.time <- plm(TT ~ Digix + factor(Year), data=pdata, index=c("Region",
"Year"), model="within")
#summary(fixed.time)
pFtest(fixed.time, fixed)
##
## F test for individual effects
##
## data: TT ~ Digix + factor(Year)
## F = 4.5119, df1 = 7, df2 = 90, p-value = 0.0002392
## alternative hypothesis: significant effects
plmtest(fixed, c("time"), type=("bp"))
##
## Lagrange Multiplier Test - time effects (Breusch-Pagan)
##
## data: TT ~ Digix
## chisq = 54.798, df = 1, p-value = 1.336e-13
## alternative hypothesis: significant effects
Both tests show lower p than alpha, which means we should add time-fixed effects to the model
Based one results of tests above we construct a regression model with Time Fixed Effects here:
\[Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 Dummy2 + \delta_T DummyT_t + u_{it}\] This model removes omitted variable bias by removing unobserved variables that change over time but remain constant between entities.
fm <- plm(TT ~ Digix + factor(Year) - 1, data =pdata, index=c("Region", "Year"), model ="within", effect = "twoways")
summary(fm)
## Twoways effects Within Model
##
## Call:
## plm(formula = TT ~ Digix + factor(Year) - 1, data = pdata, effect = "twoways",
## model = "within", index = c("Region", "Year"))
##
## Balanced Panel: n = 14, T = 8, N = 112
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -3256.28 -323.53 34.96 355.82 3167.62
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Digix 19959.0 2122.2 9.4047 4.946e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 112570000
## Residual Sum of Squares: 56776000
## R-Squared: 0.49565
## Adj. R-Squared: 0.37797
## F-statistic: 88.4481 on 1 and 90 DF, p-value: 4.9456e-15
#tlm <- lm(TT ~ Digix + Region + factor(Year) - 1, data =pdata)
#summary(tlm)
The final model is: \[TT_{it} = \beta_1
\cdot Digix_{it} + \alpha_i + \gamma_i + u_{it}\] * \(TT_{it}\) - it is the dependent variable
for region i at time t.
* \(\beta_1\) - the coefficient for the
variable \(Digix_{it}\)
* \(\alpha_i\) - represents the fixed
effect for region i.
* \(\gamma_i\) - represents the fixed
effect for time t.
* \(u_{it}\) - the error term.
\[TT_{it} = 19959.0 \cdot Digix_{it} + \alpha_i + \gamma_i + u_{it}\] The estimated fixed effects for each region are:
fixef(fm)
## Andijan region Bukhara region
## 973.75 -569.83
## Fergana region Jizzakh region
## -867.43 -376.28
## Kashkadarya region Khorezm region
## -972.70 -978.66
## Namangan region Navoi region
## -558.02 -287.40
## Samarkand region Surkhandarya region
## -1037.76 -547.66
## Syrdarya region Tashkent city
## 108.26 319.41
## Tashkent region The Republic of Karakalpakstan
## 1601.92 -731.40
Summary - The panel data analysis with two-way fixed effects provides useful insights into the economic variables that has influence on the dependent variable (TT) across different locations and time.
The equation (\(TT_{it} = 19959.0 \times Digix_{it} + \alpha_i + \gamma_t + u_{it}\)) shows a strong positive correlation (91%) between the dependent variable (TT) and the digitalization. The estimated coefficient for (Digix) is 19959.0, which is statistically significant (p < 0.001).
This coefficient shows how one unit rise in the Digitalization Index (Digix) is expected to impact on region’s Foreign Trade Turnover (TT). In other terms, Foreign Trade Turnover of each region will rise by 19,959 million US dollars for every unit improvement in digitization. This indicates that the economic operations associated with international trade within the region benefits greatly from digitalization. Therefore, regions with higher digitalization are likely to witness a considerable boost in their trade activities.
The fixed effects provide these additional details:
The Tashkent region has a relatively higher fixed effect (1601.92), hence it has a positive impact on the dependent variable beyond the digitalization, and the reason of this might be because Tashkent region includes the capital of Uzbekistan and it is more developed than other regions.
Some regions, such as Fergana, Kashkadarya, and Samarkand, have negative fixed effects. They may experience lower (TT) in the overall trend of (Digix).
The (\(R^2\)) value of 0.49565 suggests that the model explains approximately 49.6% of the data.
In conclusion, the results indicate the importance of digitaliation (Digix) in foreign trade turnover of regions. This model can be considered while developing strategies to enhance the economic development and well-being of these areas. However, further research may be needed to address the specific challenges and opportunities of each region.
Future of the study
Although this study identifies a regional variation in the digital economy that affect foreign trade turnover of regions, there are significant limitations. Firstly, there is a need for more research because the current studies have not established a common framework to measure the digital economy and there might be discrepancies in the model. Secondly, the assessment of model was established exclusively using the data, which is limited to Uzbekistan’s regions, and may be less generalizable to other countries. This study’s empirical examination of the digital economy’s impact on inter-regional commerce is conducted at the macro level, and how it might be adapted to the more particular city level is an area for future research.
library (cluster) # кластерный анализ
library(GGally)
library(plotly)
# Remove non-numeric columns for clustering
maind <- mydata %>% filter(Year == 2022)
cd <- maind %>% select(GCI, IS, APC, NCI, IA, AW)
scaled_dt1 <- scale(cd)
fviz_nbclust(scaled_dt1, kmeans, method = "wss")
Perform K-means clustering:
library(ggrepel)
library(tidyr)
library(geojsonio)
library(tmaptools)
library(tmap)
library(sf)
library(leaflet)
# Data frame with number of clusters per year
yed <- data.frame(
Year = 2015:2022,
KC = c(3, 3, 3, 3, 3, 4, 3, 3)
)
# Loop through unique years
unique_years <- unique(mydata$Year)
for (year in unique_years) {
# Subset data for the current year
md <- mydata %>% filter(Year == year) %>% select(Year, GCI, IS, APC, NCI, IA, AW, TNRS, NRS, NRSI, SHPC, SHLN, NCLN)
year_data <- mydata %>% filter(Year == year) %>% select(GCI, IS, APC, NCI, IA, AW, TNRS, NRS, NRSI, SHPC, SHLN, NCLN)
# Check if data exists for the current year
if (nrow(year_data) > 0) {
# Scale the data
scaled_dt <- scale(year_data)
# Determine the number of clusters for the current year
#ok <- fviz_nbclust(scaled_dt, kmeans, method = "wss")
#print(ok)
kc <- as.numeric(yed %>% filter(Year == year) %>% select(KC))
cat("Number of clusters for year", year, ":", kc, "\n")
# Perform k-means clustering
set.seed(1) # Set seed for reproducibility
kmeans_result <- kmeans(scaled_dt, centers = kc)
# Add cluster assignments back to the original data
mydata_clustered <- mydata %>% filter(Year == year) %>%
mutate(Cluster = kmeans_result$cluster)
# Extract the first two principal components for plotting
pca_result <- prcomp(scaled_dt)
pc_data <- data.frame(pca_result$x[, 1:2])
pc_data <- cbind(pc_data, mydata_clustered)
# Plotting the clusters
plot_title <- paste("Cluster Analysis Result for Year", year)
p <- ggplot(pc_data, aes(x = PC1, y = PC2, color = as.factor(Cluster))) +
geom_point() +
geom_text_repel(aes(label = Region), size = 3) +
labs(title = plot_title, color = "Cluster") +
theme_minimal()
print(p)
final_data <- cbind(md, Cluster = as.factor(kmeans_result$cluster))
#print(final_data)
# Parallel coordinate plot
p2 <- ggparcoord(data = final_data, columns = c(2:13), groupColumn = "Cluster", scale = "center") + labs (x = "Variables", У = "Values", title = "Cluster Analysis based on k means")
# Make the plot interactive
ggplotly(p2)
print(p2)
a <- aggregate(year_data, by=list(cluster=as.factor(kmeans_result$cluster)), mean)
print(a)
} else {
cat("No data available for year", year, "\n")
}
}
## Number of clusters for year 2015 : 3
## cluster GCI IS APC NCI IA AW TNRS
## 1 1 120.8667 416.7000 35410.83 8800.000 21.73333 1063.433 1148.083
## 2 2 112.1000 2465.3000 245874.00 113251.000 37.70000 2256.000 4534.400
## 3 3 117.0143 481.9286 39461.43 8265.143 16.08571 1077.829 1320.457
## NRS NRSI SHPC SHLN NCLN
## 1 61.85000 60.3 45.23333 5.116667 12205.17
## 2 190.30000 182.2 55.40000 7.000000 129588.00
## 3 55.31429 54.2 38.18571 3.385714 12077.57
## Number of clusters for year 2016 : 3
## cluster GCI IS APC NCI IA AW TNRS NRS
## 1 1 109.26 510.8100 39783.4 9996.00 20.50000 1167.360 1254.49 55.85
## 2 2 108.90 553.4333 45624.0 13247.67 28.83333 1119.167 1329.40 67.60
## 3 3 121.70 2858.3000 266061.0 131654.00 42.80000 2650.600 4732.30 196.50
## NRSI SHPC SHLN NCLN
## 1 54.93000 43.81 4.630000 13817.10
## 2 65.53333 60.10 5.966667 15272.67
## 3 190.10000 57.40 7.300000 141477.00
## Number of clusters for year 2017 : 3
## cluster GCI IS APC NCI IA AW TNRS NRS
## 1 1 127.500 3169.700 280899.00 144896.0 44.5000 3181.700 4990.000 204.10
## 2 2 119.240 461.900 37614.40 10817.4 20.5200 1520.300 994.720 53.86
## 3 3 115.225 711.125 48106.75 13934.5 24.4125 1405.638 1567.612 65.75
## NRSI SHPC SHLN NCLN
## 1 197.2000 60.3000 7.1000 151742.00
## 2 52.9600 37.8800 5.4400 15579.60
## 3 64.2375 52.1375 4.7375 16842.25
## Number of clusters for year 2018 : 3
## cluster GCI IS APC NCI IA AW TNRS
## 1 1 107.9344 865.6375 53024.88 16498.12 26.3375 1825.05 1526.625
## 2 2 110.0588 607.3400 41670.80 13446.00 19.9600 1839.72 1028.680
## 3 3 123.5120 3359.9000 297347.00 158788.00 43.2000 4212.70 4598.100
## NRS NRSI SHPC SHLN NCLN
## 1 63.9375 62.25 60.975 5.2625 19015.62
## 2 52.1600 51.16 40.320 5.1600 16662.40
## 3 184.9000 178.20 60.300 8.8000 166057.00
## Number of clusters for year 2019 : 3
## cluster GCI IS APC NCI IA AW TNRS
## 1 1 110.3500 744.200 42953.00 15305.50 19.40000 2112.575 1020.175
## 2 2 111.2889 1095.422 57858.33 19603.89 24.26667 2171.344 1699.411
## 3 3 105.8000 3550.600 320189.00 175760.00 39.50000 4953.000 4471.300
## NRS NRSI SHPC SHLN NCLN
## 1 57.25000 55.37500 38.80000 4.625000 16568.00
## 2 67.76667 65.74444 60.03333 4.722222 20276.22
## 3 176.00000 166.80000 70.00000 7.900000 168112.00
## Number of clusters for year 2020 : 4
## cluster GCI IS APC NCI IA AW TNRS
## 1 1 116.4000 774.800 23537.67 9445.333 14.23333 2672.033 1015.767
## 2 2 143.1000 1255.200 81215.00 36050.000 19.40000 2181.800 1405.100
## 3 3 132.0000 4620.900 319517.00 175926.000 30.70000 5512.000 5126.000
## 4 4 111.1889 1308.944 60371.22 22400.111 20.40000 2455.278 1821.433
## NRS NRSI SHPC SHLN NCLN
## 1 64.93333 63.20000 34.90000 2.466667 7480.00
## 2 47.60000 45.00000 45.20000 3.400000 27305.00
## 3 194.00000 155.50000 58.60000 6.800000 163233.00
## 4 71.04444 69.06667 64.47778 3.133333 18173.33
## Number of clusters for year 2021 : 3
## cluster GCI IS APC NCI IA AW TNRS
## 1 1 127.1667 1200.431 52321.33 26488.0 11.67503 3552.265 1412.967
## 2 2 132.5000 4275.405 352751.00 207782.0 25.74264 6851.202 6323.400
## 3 3 115.6700 1511.046 59642.80 25168.7 16.79413 3014.503 1846.010
## NRS NRSI SHPC SHLN NCLN
## 1 61.30 57.40 30.60 2.500 17931.33
## 2 224.10 154.90 48.20 5.700 184062.00
## 3 75.49 70.11 49.94 2.303 18370.40
## Number of clusters for year 2022 : 3
## cluster GCI IS APC NCI IA AW TNRS
## 1 1 127.7102 1727.343 57252.71 29890.29 12.84286 4133.171 1944.571
## 2 2 125.1214 1541.209 67650.83 37061.33 21.36667 4175.209 1779.667
## 3 3 130.1430 5384.981 381051.00 236242.00 28.60000 9148.741 7691.900
## NRS NRSI SHPC SHLN NCLN
## 1 74.38571 68.21429 25.34286 3.757143 24126.71
## 2 77.05000 71.76667 40.86667 5.733333 27783.00
## 3 264.40000 157.60000 43.00000 11.300000 233004.00
# Load required packages
library(sf)
library(leaflet)
# Read GeoJSON file
uzbekistan_geojson <- "./uzbekistan_regional.geojson"
uzbekistan <- sf::st_read(uzbekistan_geojson)
## Reading layer `uzbekistan_regional' from data source
## `/Users/javlon/University/Big Data/FW/Independent work/uzbekistan_regional.geojson'
## using driver `GeoJSON'
## Simple feature collection with 14 features and 3 fields
## Geometry type: GEOMETRYCOLLECTION
## Dimension: XY
## Bounding box: xmin: 55.9989 ymin: 37.18212 xmax: 73.14812 ymax: 45.59051
## Geodetic CRS: WGS 84
# Replace names
uzbekistan$ADM1_EN[uzbekistan$ADM1_EN == "Kashkadarya province"] <- "Kashkadarya region"
uzbekistan$ADM1_EN[uzbekistan$ADM1_EN == "Republic of Karakalpakstan"] <- "The Republic of Karakalpakstan"
str(uzbekistan)
## Classes 'sf' and 'data.frame': 14 obs. of 4 variables:
## $ ADM1_EN : chr "Tashkent city" "Namangan region" "Tashkent region" "Fergana region" ...
## $ ADM1_RU : chr "г. Ташкент" "Наманганская область" "Ташкентская область" "Ферганская область" ...
## $ ADM1_UZ : chr "Toshkent sh." "Namangan viloyati" "Toshkent viloyati" "Fargʻona viloyati" ...
## $ geometry:sfc_GEOMETRYCOLLECTION of length 14; first list element: List of 1
## ..$ :List of 2
## .. ..$ :List of 1
## .. .. ..$ : num [1:935, 1:2] 69.1 69.1 69.1 69.1 69.1 ...
## .. ..$ :List of 1
## .. .. ..$ : num [1:88, 1:2] 69.2 69.2 69.2 69.2 69.2 ...
## .. ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## ..- attr(*, "class")= chr [1:3] "XY" "GEOMETRYCOLLECTION" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
## ..- attr(*, "names")= chr [1:3] "ADM1_EN" "ADM1_RU" "ADM1_UZ"
uzbekistan$ADM1_EN
## [1] "Tashkent city" "Namangan region"
## [3] "Tashkent region" "Fergana region"
## [5] "Andijan region" "Syrdarya region"
## [7] "Jizzakh region" "Navoi region"
## [9] "Samarkand region" "Kashkadarya region"
## [11] "Surkhandarya region" "Bukhara region"
## [13] "Khorezm region" "The Republic of Karakalpakstan"
# Subset data for the selected year
year_data_subset <- mydata %>% filter(Year == 2022) %>% select(Region)
year_data_a <- mydata %>% filter(Year == 2022) %>% select(GCI, IS, APC, NCI, IA, AW, TNRS, NRS, NRSI, SHPC, SHLN, NCLN)
# Scale the data
scaled_data <- scale(year_data_a) # Exclude Year column for scaling
# Perform k-means clustering
set.seed(1) # Set seed for reproducibility
kmeans_result <- kmeans(scaled_data, centers = 3)
# Add cluster assignments back to the original data
year_data_clustered <- year_data_subset %>%
mutate(Cluster = kmeans_result$cluster)
# Merge Digix values with the GeoJSON data
uzbekistan <- merge(uzbekistan, year_data_clustered, by.x = "ADM1_EN", by.y = "Region", all.x = TRUE)
# Convert geometry to MULTIPOLYGON
uzbekistan <- st_cast(uzbekistan, "MULTIPOLYGON")
# Define colors for each cluster
cluster_colors <- c("#e9493e", "#00bb38", "#629dff")
# Plot the map
leaflet(data = uzbekistan) %>%
addProviderTiles("OpenStreetMap.Mapnik") %>%
addPolygons(fillColor = ~cluster_colors[Cluster],
fillOpacity = 0.8,
stroke = FALSE,
label = ~paste(ADM1_EN, ": Cluster", Cluster)) %>%
addLegend("bottomright",
colors = cluster_colors,
labels = c("Cluster 1", "Cluster 2", "Cluster 3"),
title = "Clusters",
opacity = 1)
Cluster 1 comprises regions characterized by moderate development in the context of digitalization, particularly within the realm of communication and information services. The growth rate for communication and information services (GCI) in this cluster stands at 127.80%, indicating a steady but not exceptional expansion in this sector. The number of subscribers with Internet access (IS) in these regions is relatively low, at 1677.471 thousand, suggesting limited penetration of Internet services among the population. This limitation could be attributed to various factors, including inadequate infrastructure, lower levels of digital literacy, or economic constraints preventing wider access to Internet services.
Furthermore, the availability of personal computers (APC) in enterprises and organizations is moderate, with 56815.88 units reported. This level of availability indicates that while there are computers present in the business sector, their distribution is not extensive, potentially impacting productivity and digital engagement within these organizations. The number of computers connected to the Internet (NCI) is 29377.5 units, reflecting a similar trend of moderate connectivity. The share of enterprises and organizations with Internet access (IA) is relatively low at 13.46%, pointing to significant room for improvement in integrating digital tools and online connectivity in business operations. The average monthly nominal wages in the information and communication sector (AW) are 4121.696 thousand soums, which is on the lower end, suggesting that the sector might not be as financially rewarding in these regions, potentially influencing the attractiveness of careers in this field.
Regions in Cluster 2 exhibit slightly more advanced digitalization metrics compared to Cluster 1, but they still fall short of the highest standards. The growth rate for communication and information services (GCI) in this cluster is 124.47%, slightly lower than in Cluster 1, yet still indicative of a growing sector. The number of subscribers with Internet access (IS) is 1583.778 thousand, which is comparable to Cluster 1, reinforcing the notion that Internet penetration remains an area needing further development. This similarity in Internet subscription numbers suggests that these regions might face similar barriers to digital adoption as those in Cluster 1, such as infrastructural limitations or socio-economic factors.
The availability of personal computers (APC) in enterprises and organizations is higher in Cluster 2, with 70429.40 units, indicating better provision of digital tools within the business sector. This increased availability likely supports enhanced operational efficiency and digital engagement among businesses. Correspondingly, the number of computers connected to the Internet (NCI) is higher at 39316.0 units, reflecting improved connectivity and integration of digital resources. The share of enterprises and organizations with Internet access (IA) in Cluster 2 is 22.08%, demonstrating a notable improvement over Cluster 1. This higher percentage suggests that more businesses are leveraging online tools and platforms to enhance their operations and market reach. The average monthly nominal wages in the information and communication sector (AW) are slightly higher at 4201.977 thousand soums, indicating a marginally more lucrative sector, which could potentially attract more talent and investment into this field.
Cluster 3 represents the most advanced regions in terms of digitalization and development of communication and information services. The growth rate for communication and information services (GCI) in this cluster is the highest at 130.14%, highlighting robust expansion and significant investments in this sector. The number of subscribers with Internet access (IS) is markedly higher at 5384.981 thousand, indicating extensive penetration and widespread access to Internet services among the population. This high level of Internet subscription suggests well-developed infrastructure, higher digital literacy, and greater economic capability to support widespread Internet use.
The availability of personal computers (APC) in enterprises and organizations is exceptionally high at 381051.0 units, reflecting a substantial provision of digital tools within the business sector. This extensive availability supports advanced operational capabilities and fosters a high level of digital engagement and productivity. Similarly, the number of computers connected to the Internet (NCI) is the highest among all clusters at 236242.0 units, indicating comprehensive connectivity and integration of digital resources. The share of enterprises and organizations with Internet access (IA) is also the highest at 28.60%, demonstrating widespread adoption of online tools and platforms, which likely enhances business operations and competitiveness. The average monthly nominal wages in the information and communication sector (AW) are significantly higher at 9148.741 thousand soums, making this sector highly lucrative and likely attracting top talent and substantial investment. This high wage level suggests a vibrant and dynamic sector that is central to the economic activities in these regions, further driving digital advancement and economic growth.
Li, M.; Zhang, L.; Zhang, Z. Impact of Digital Economy on Inter-Regional Trade: An Empirical Analysis in China. Sustainability 2023, 15, 12086. https://doi.org/10.3390/su151512086↩︎
Abeliansky, A.L.; Hilbert, M. Digital technology and international trade: Is it the quantity of subscriptions or the quality of data speed that matters? Telecommun. Policy 2017, 41, 35–48.↩︎
He, S.; Zhao, J.; Zhang, R. Development level of digital economy, trade costs and trade in value-added. Int. Econ. Trade Res. 2021, 11, 4–19.↩︎
Baltagi, B. H. (2005). Econometric analysis of panel data (Vol. 3). John Wiley & Sons.↩︎
Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data (2nd ed.). MIT press. [2]↩︎
Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence. Review of Economic Studies, 58(2), 277-297. [3]↩︎